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Single case designs are a set of research methods for evaluating treatment effects by assigning different 
treatments to the same individual and measuring outcomes over time and are used across fields such as 
behavior analysis, clinical psychology, special education, and medicine. Emerging standards for single case 
designs have focused attention on the need for effect sizes for summarizing and meta-analyzing findings 
from the designs; although many effect size measures have been proposed, there is little consensus 
regarding their use. This article proposes a new effect size measure for single case research that is directly 
comparable with the standardized mean difference (Cohen's d) often used in between-subjects designs. 
Techniques are provided for estimating the new effect size, as well as its variance, from balanced or 
unbalanced treatment reversal designs. The proposed estimation methods are further evaluated using a 
simulation study and then demonstrated in two applications. Copyright © 2012 John Wiley & Sons, Ltd. 
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Single case designs are distinguished by the fact that they assign different treatments to the same individual and 
measure an outcome over time. Such designs are a special type of the broader category of repeated measures 
designs. These designs are widely used in behavior analysis, clinical psychology, and special education, and 
sometimes used in medicine. In principle, single case designs permit the detection of treatment effects from a 
study that involves a single case and a comparison between two periods (one that is a baseline or control period 
and another period in which a treatment is assigned to that case). In practice, studies using single case designs 
usually involve more than one period in each treatment condition and replications of the design with more than 
one case. Emerging standards for single case designs emphasize at least two periods of each treatment (leading to 
three treatment contrasts or reversals) and replications across several individuals. [15] 

Evaluation of the results of single case designs involves the search for functional relations between treatment 
assignment and an outcome. That is, in the ideal, the study is designed so that each treatment (or baseline) phase 
is continued for enough measurements that the pattern of outcome values is clearly established. To establish 
functional relations, researchers prefer stability within treatment phases, with treatment effects conceived as 
differences in these stable patterns between treatment and control phases. Stability, however, can be 
conceptualized in many different ways. For example, the pattern could be one of fluctuation around a constant 
value with a common mean within a phase and a common residual variance within all phases. The pattern could 
also involve systematic increase or decrease across measurements in a phase, such as a linear or quadratic trend, 
and a common residual variance within phases. Alternatively, the pattern could include a constant mean or a 
trend over measurements accompanied by larger or smaller residual variation around the trend, depending on 
whether the treatment is in effect. From this perspective, functional relations between treatment and outcome 
(what one would call treatment effects in between-subjects designs) are understood to be contrasts between 
the stable states established within treatment phases. 

Statistical analysis is not always used in assessing treatment effects in single cases designs. One reason for this 
is that repeated measurements from the same individual often cannot be considered independent (i.e. they 
involve an autocorrelation structure), which complicates statistical analysis. Another reason is that stability within 
phases may take many forms, each of which requires a somewhat different statistical analysis to evaluate 
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comparisons among phases. Still, once a particular conception of stability is posited, there is no principled reason 
that statistical methods (including hypothesis tests) cannot be used in the analysis of treatment effects in single 
case designs. For a review of statistical methods for such analyses, see Kratochwill and Levin. [16] 

Treatment effects are quantified using measures called effect sizes, which are useful for a variety of reasons. 
Effect size measures are often used as a supplement to hypothesis tests and as a way to quantitatively summarize 
study results (see [25]). Effect size measures also have an important function in making more comparable the 
results of studies using different designs and outcome measures (see, e.g. [1 2,24]). In one approach to systematic 
research review, effect sizes provide the basis of formal quantitative syntheses (see, e.g. [7]). There is substantial 
consensus on methods for computing effect sizes in between-subjects designs, and many 'standard' effect size 
measures are well known among researchers (e.g. [5,8]). In contrast, there is much less consensus on methods 
for computing effect sizes in single case designs (see, e.g. [2,18,20,22,4]). 

One reason that defining effect sizes in single case design is difficult is that they represent differences between 
stable patterns in different phases. Because there are many different kinds of stability, it is difficult to propose an 
effect size measure that is equally appropriate for expressing differences between phases for all of them. We 
believe that some progress can be made by focusing on specific types of stable data patterns and defining effect 
size measures that express the differences between phases for stable patterns of a single, given type. 

The purpose of the present paper is to propose a new effect size measure and corresponding estimation 
techniques for single case designs. The focus is on a single type of stable pattern, arguably the simplest type: 
fluctuation around a constant value (a common mean with a common residual variance within phases). We offer 
a statistical model in which the effect size parameter corresponds to the standardized mean difference (Cohen's 
d), a well-known effect size parameter in between-subjects designs. Our effect size measure thus has the virtue of 
expressing the treatment effect from single case designs on the same metric as that often used in between- 
subjects designs. We propose an estimator for this effect size, derive its approximate sampling distribution 
(including expressions for the mean and variance), and evaluate the accuracy of the analytic expressions for the 
mean and variance of the estimator. The initial exposition describes the effect size in the context of a two-phase 
(AB) 1 design with equal numbers of time points within each phase, where A indicates one phase (e.g. baseline), B 
indicates a second phase (e.g. treatment), and the superscript indicates the number of AB sequences in the study. 
In a later section, we generalize the results to designs with 2k phases, so-called (AB)* designs, in which each 
individual may have an unequal number of time points in different phases. 


1. The balanced (AB) 1 design with n observations in each phase 


1.1. Model 


We begin by positing a structural model for the data collected from an AB design. This model is broad enough to 
encompass both a between-subjects experiment and a single case design with replications across individuals, 
making it possible to identify a parameter that is a conventional effect size (the standardized mean difference 
or d-index) in the between-subjects design. We then show that it is possible to estimate the same parameter using 
data collected with a single case design. 

To simplify the initial exposition, consider a two-period (AB) 1 design. Let be the / h observation from the / th 
individual, where there are m > 2 individuals, and for each individual, suppose that the first n observations are in 
the baseline period, followed by n observations in the treatment period. Thus, the data are denoted Y,yfor 
and j= 1, . . ., 2n. We can describe the entire data layout as follows. 

Control Treatment 


Yu, 

Yn, 

* * * 9 

Y\n, 

Yn, 

Y 22, 


Y ln , 

. . 

Y m2 , 

* * * 9 

v 

± mn > > 



There is a between-groups experiment with total sample size m embedded within this data structure. Suppose 
that m q cases are randomly assigned to receive the treatment condition, the remaining m 0 cases receive the 
control condition, and a single observation is made from each case at time j= 1. We can adjust the numbering 
of cases so that the first m 0 cases (/ = 1 , . . ., m 0 ) are the control condition and the second set of m q cases (/ = m 0 + 
1, . . ., m) are in the treatment condition. This is shown by the two boxes in the illustration that follows. Note that 
because there is only one observation per case and it is reasonable to assume that cases are independent, the 
data should meet the usual assumptions invoked in the analysis of a between-subjects design. Further, note that 
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we have chosen one subset of the data that yielded a between-subjects experiment, but this is only one of many 
possibilities. For example, we might choose the first observation in each period, or the second, or the third, and so 
on. Similarly, we might choose to use the first observation in both periods, or we might choose the first in the 
control period but the second (or the third) in the treatment period, and so on. 



We posit the following stochastic model for such data: (1 ) the Vy are normally distributed, (2) the data series for 
each case lacks any time trend, and (3) within each case, the deviations from the mean at each point in time are 
weakly stationary, with first-order autocorrelation <j>- Specifically, the statistical model for the first period is 

Yu = n c + rii + Bij, 

and the statistical model for the second period is 

Yij = / + Vi + Eij, i = 1 , . . . , m;j = n + 1 . . . . , 2 n 

where -\f = A represents the shift between baseline and treatment periods. We assume that the individual 
level effects ?/,• are normally distributed with zero mean and variance t 2 . The assumption that each time series 
is weakly stationary implies that, conditional on the r the covariance of V© with V', (/+ 1) depends only on f. We 
assume further that the a,y have variance a 2 and first-order autocorrelation </> within individuals, so that 

Cov{e,y,8 /y } - jf j=r ■ 

The 2 n x 2 n covariance matrix of the errors within individuals therefore has the form 

/ a 2 <t>a 2 ... <£ 2 "-V\ 

0f7 2 f7 2 ... <£ 2 "-V 

V0 2n ~V (j> 2n - 2 a 2 ...a 2 


2. The effect size parameter 


Because the variance of observations within cases is a 2 and the variance of observations between cases is t 2 , the 
total variance of each observation is ct 2 + t 2 . Under this model, define the effect size parameter 


\/f7 2 + T 2 


(D 


This definition of the effect size is precisely the standardized mean difference (Cohen's d-index) that is widely used 
in between-subjects experiments. Note that the variance here consists of a within-person component a 2 and a 
between-person component r 2 that are not separable in a between-subjects design. Thus, the variance estimated 
in the between subjects design is ct 2 + t 2 . Because the effect size parameter is the same in either the single case 
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design or a corresponding between-subjects design, estimates of this parameter will be on the same scale, and 
thus comparable in magnitude, whether they are based on data from a single case design or a between-subjects 
design. 


3. Estimation of effect size 

There are several possible approaches to estimation of < 5 . The approach used in this paper is to compute a pooled 
(across cases) estimate of — fi c ) and a pooled (across cases) estimate of (ct 2 + t 2 ) and to combine those 
estimates to obtain an estimate of < 5 . 

Other approaches to the estimation of (5 are certainly possible. One approach is to estimate {// — fi c ), a, and 
t individually (e.g. using maximum likelihood) and combine the estimates to obtain an estimate of < 5 . This 
approach has the disadvantage that the small sample properties of the combined estimate of <5 are difficult 
to obtain analytically; consequently, there is no obvious method to determine the bias of the resulting 
estimate, so that it can be reduced or eliminated. A second approach is to note that comparing single points 
in time in the control and treatment phases (the implicit between-subjects study) can yield an estimator that 
is a conventional standardized mean difference, one whose sampling distribution is known. A disadvantage of 
that estimator is that it throws away most of the data. A third approach is to average a series of estimates 
using the second approach (e.g. one for each time point). This has the advantages that it uses all the data 
and has known (but rather complicated) multivariate f-distribution theory, but it has the disadvantages that 
the average has a very complex distribution. Also, we found in preliminary, unreported simulation work that 
this estimator is much less efficient than others; that is, its sampling distribution has larger variance than other 
approaches. 

The approach presented in this paper is to compute pooled estimates of (//— f.f) and (ct 2 + t 2 ) and to use these 
estimates to obtain an estimate of < 5 . This is similar to approaches two and three. It has the advantage of providing 
direct estimates of the numerator and denominator of < 5 . Moreover, the numerator has a normal distribution, and 
the denominator is approximately the square root of a random variable having a chi-squared distribution; the ratio 
therefore has a distribution that is approximately proportional to a non-central f-distribution, just as in the case of 
between-subjects experiments. This approach also has a less obvious advantage. The bias and to some extent 
variance of the effect size estimate depend on the effective number of degrees of freedom in the denominator 
(more is better). Whereas approaches two and three have a denominator with m — 1 degrees of freedom (one less 
than the number of individuals), the approach used here has a denominator whose sampling distribution has 
more than m — 1 degrees of freedom and thus has improved sampling properties. Some rather extensive 
simulation studies confirmed that the approach used in this paper has more desirable sampling properties (less 
bias, more accurate variance approximations, and smaller mean squared error) than the three alternatives 
mentioned earlier. 

Define the effect size estimate ES via 



( 2 ) 


where 



(3) 



(4) 


and Y. t is the mean across individuals at the f th time point given by 


m 





It follows that D is an unbiased estimator of (/( 7 — fi c ), and S 2 is an unbiased estimate of the total variance 

(t7 2 + t 2 ). 


Under this model, the variance of D is 



(5) 


where b p and c p are functions of the autocorrelation <j> and the phase length n, defined in general as 
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1 n n 


1 

b 

n 


2 

n* 


n - 1 




( 6 ) 


and 


c p 


iEX> p|n+M = 

S=1 f=1 


1 

f? 


n— 1 


E 0 p(n+o («-ifi)- 


(7) 


Here, we define b p and c p in general because, although the variance of D depends only on b, and c q (that is b p and 
c p for p= 1), we will need b 2 and c 2 in expressions for the variance of S 2 . Under the aforementioned model, the 
variance of S 2 is 


V{S 2 } 


[(£>2 + C 2 )(1 - p) 2 + 2(bi + Cl )p( 1 - p) + 2p 2 j (ff 2 + T 2 ) 2 
m- 1 


( 8 ) 


where 


P = 



(9) 


is a kind of intraclass correlation that represents the between-person variance t 2 as a fraction of the total variance 
(t 2 + ff 2 ). 

Using a theorem in Box [6] on the distribution of quadratic forms in normal variables, it follows that the 
sampling distribution of S 2 is approximately a chi-squared with v degrees of freedom, where v is given by 


(£>2 + c 2)0 ~ P ) 2 + 2(£>i + Ci )p(1 — p) + 2 p 2 


( 10 ) 


Therefore, ES is a constant 0 times a random variable with the non-central f-distribution with v degrees of 
freedom, where 6 is given by 


0 = 


V{D} _ /2(b, -c,)(1 -p) 


m 


( 11 ) 


It follows from the results in Hedges [9] that the bias in ES can be corrected by multiplying ES by the factor 

3 


J(v) = 1 


4v — 1 


so that the effect size 


G = J(v)ES 


is approximately an unbiased estimator of <5. 

It also follows that the variance of G is approximately 


V{G} = J(v) 2 


V0 2 


v — 2 


v - 2 j( v )‘ 


A slightly simpler asymptotic approximation of the variance is 


V a {G} = J(v) 2 


e 2 +r- 

2v 


( 12 ) 


(13) 


(14) 


(15) 


Note that the expressions (13) for G and (14) and (15) for its variance involve the nuisance parameters (j> and p. In 
application, (14) or (15) will need to be evaluated using the estimate G in place of <5, along with estimates for the 
nuisance parameters. 


4. Accuracy of the approximate sampling distribution 

We investigated the sampling distribution of the estimator under two conditions: first, when the nuisance 
parameters 4> and p were known and second, when the nuisance parameters were estimated. For each part, 
we limited consideration to the (AB) 1 design because this is the simplest design, although it has the essential 
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features of all (AB)^ designs. Designs with more periods should have larger sample sizes; we would therefore 
expect the estimator to perform no worse (and probably better) in these designs. For each analysis, we varied 
<p and p over their entire parameter spaces. We considered values of n and m that appear to be representative 
of values found in the single case literature, based on a recent survey by Shadish and Sullivan. [23] Table 1 reports 
the parameter values and sample sizes used in the analyses. 

First, we considered the bias of G and the accuracy of the expressions for its variance if known values of the 
nuisance parameters are used. For known <p and p, analytic expressions are available for the moments of G; we 
provide a derivation in Appendix A. Using these to evaluate the bias of G, we found it to be small, as expected. 
For example, when m = 4 and n = 4, the estimated bias never exceeds 0.04<5 (relative bias of 4%). For more 
moderate values of 0<p<0.4 and — 0.5<</><0.5, the bias is always less than 0.02<5. 

The variance (14) is quite accurate when the intraclass correlation p is small to moderate (say below 0.3), but it 
tends to overestimate the variance somewhat as p becomes large, a tendency that is exaggerated when there is 
a large negative or positive autocorrelation. The simpler variance estimate (15) tends to underestimate the 
variance, often substantially. A typical result is shown in Fig. 1, which plots the estimated relative variance (the 
average value of the variance estimate divided by the exact value of the variance) on the vertical axis against <p 
on the horizontal axis. The effect size is fixed at <5 = 0.4, and separate curves are plotted for p = 0.2, 0.4, and 0.6; 
each of the panels in the figure corresponds to a different combination of values of m and n. In this figure, all 
of the curves above the relative variance of 1.0 correspond to the variance estimate (14), whereas those below 
a relative bias of 1 .0 correspond to the simpler variance estimate (15). Note that the vertical scale of these graphs 
varies depending on the value of m. 

To determine the properties of the effect size estimate G and its variance when the nuisance parameters are 
estimated, we carried out a simulation study. The simulation involved five parameters, the levels of which are 
reported in Table 1. The number of cases, m, was varied from 4 to 12 to capture a range of values observed in 
practice. The number of observations per period, n, assumed to be equal in the baseline and treatment periods, 
was varied from 4 to 12. The average treatment effect may range greatly depending on the treatment under study; 
we therefore examined a wide range of levels. We varied the autocorrelation <p and the intraclass correlation p over 
all but the most extreme possible values. The total variance t 2 + o 2 was fixed equal to one. For each combination of 
the parameters <5, <p, p, m, and n, we simulated 8000 iterations of the model. Rather than attempting to summarize 
the results across the entire set of simulation parameters, we report results for selected margins. Full simulation 
results, as well as the R code used to generate the simulations, are available from the second author. 

For purposes of simulation, we used the simplest estimators of <p and p, the method-of-moments (sometimes 
called the Yule-Walker) estimators, with simple bias-reducing corrections; see Appendix C for details. These 
estimators are known to be biased, even after correction, when computed from single short time series (see, 
e.g. [14]). Other estimators are available from the literature on single time series, but the generalization of these 
estimators to unbalanced multiple time series is not straightforward, nor is it obvious whether, or how much, 
these generalizations might improve on simpler estimators of <p and p. 

Our simulation studies when (p and p are estimated from the data confirmed that the bias of G remains small, 
except in the case of very large (and probably unrealistic) negative autocorrelations (e.g. cp = — 0.9). When 
| tp | <0.5 and p< 0.5, the relative bias of G is always less than 3% in absolute value. It appears that the variance 
of G is estimated more poorly when <p and p are estimated from the data, regardless of whether (14) or (15) is 
used. Figure 2 provides an illustration; it is constructed just as Fig. 1, plotting the relative variance (the average 
estimated variance divided by the true variance) as a function of (p for various values of m, n, and p. It is likely that 
improved estimation of <p and p may yield improved variance estimation; this remains a topic for future 
investigation. 


5. The (AB) fc design with unequal numbers of observations in each phase for 
each individual 

Suppose that a study uses an (AB) k design, so that there are 2k phases, and m cases. We allow that each individual 
case may have a different number of observations in each phase. Let nf, a = 1 , . . ., 2k, / = 1 , . . ., m be the number of 
observations in the a th phase for the /' th individual, and define n° = 0 for all /= 1, . . ., m. 


Table 1 . Simulation parameter values 

Parameter 

Definition 

Number of levels 

Minimum 

Step 

Maximum 

m 

Number of cases 

3 

4 

4 

12 

n 

Observations per period 

3 

4 

4 

12 

<5 

Effect size 

4 

0.0 

0.4 

1.2 


Autocorrelation 

10 

-0.9 

0.2 

0.9 

T 

t 2 /( t 2 + rr 2 ) 

5 

0.0 

0.2 

0.8 
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p 

— 0.2 
0.4 
0.6 


Expression 



(14) 

(15) 


Figure 1. Ratio of variance computed from (14) and (15) to the exact variance of G when 4>, p, and <5 = 0.4 are known. 



Figure 2. Ratio of the variance computed from (14) and (15) to the simulated variance of G when 4> and p are estimated, for <5 = 0.4. 


Because the number of observations within phases is not the same across individuals, we need a notation that 
can denote the first, second, and so on observation within each phase for each individual. Define the total number 
of observations (/-values) for the /' th individual through the o th phase to be Nf, so that 

A I- =n° + n] + • + nf 

and define Nl to be the total number of observations (the sum of the nf) so that 
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w : = E w ? = EE<' 


The fl th phase for the / th individual includes the ;'-values between N° 1 + 1 and N° 1 +nf = A/f inclusive. Thus, the 
design consists of observations 


Yj,i = 1 , ■ ■ • , m;j = A/, a 1 + 1 , . . . , A/, a 1 


-nr 


for phases a= 1, .. .,2k. 

As in the case of the (AB) 1 design, we posit a stochastic model in which the Yg are normally distributed, the data 
series for each individual lacks any time trend, and within individual, the deviations from the mean at each point in 
time are weakly stationary with first-order autocorrelation <j>. Specifically, the statistical model for the o th phase is 


Y 'i~2 '] A C + j I 1 + ( -1 ) a ]A r + T h + E i 


for / = 1 , .... m and j= A/, 0-1 + 1, . . ., A/, 0_1 +nf. The expressions in square brackets just assure that, in odd- 
numbered phases (baseline phases), the coefficient of /.f is one and the coefficient of // is zero and that in 
even-numbered phases (treatment phases), the coefficient of // is one and the coefficient of fi c is zero, where 
//— /i C = A represents the shift between baseline and treatment periods. We assume that the individual level 
effects r]j are normally distributed with zero mean and variance t 2 . The assumption that each time series is weakly 
stationary implies that, conditional on the ??/, the covariance of Yj with Y i( j +t] depends only on t. As before, we 
assume that the e,y have variance a 2 and first-order autocorrelation (j> within individuals, so that 

r r | ( 0 if / ± T 

Cov{ s,j,er/}s^ H(r2 if ;=# , . 


The denominator of the effect size estimate is based on the variance across individuals for each time point, 
averaged over time points. Because each individual can have a different number of observations within each 
phase, there may not be a complete set of m observations for some time points. The contribution to the variance 
for a time point is computed only if there is an observation at that time point for every individual. Define the 
minimum number of observations for any individual in the o th phase by M a , so that 

M a = Minimum{r)°, . . . ,n a m } (16) 

and define M m to be the sum of the M a , so that 

M* = M ] + ■■■ +M 2k . (17) 

The variance pooled across phases and across individuals is 


S 2 


1 

M‘(m 


2k 



N°-'+M° m 

E EOw-E 


y=N“-’+ 1 i'=i 


where Y.j is the average across individuals of the / h observations, given by 


(18) 


m 

Y.j E Y i- 

‘ raf-' ' 


Note that when k= 1, nf = n for /= 1, .... m, and a =1,2, it follows that M’ = 2n and expression (18) reduces to 
expression (4) for the pooled variance. The numerator of the effect size is the unweighted mean difference 
between phases A and B, defined as 


t m k / , m k / , Nf° Nf 1 

E r.-gt, E r, 

/=1 0=1 \ < j=N 2a ~'+ 1 ' 


' j=N 2 °- 2 +-\ 


Define the effect size estimate ES exactly as in (1), namely, 


where D is given in (19) and S 2 is given in (18). 
Define the auxiliary constant A via 


(19) 


( 20 ) 
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« m 2k 2k 


and the auxiliary constants B, C, and D via 


MH-i) 


C =EEE E E 

1=1 0=1 b=l 5 = 1 1=1 


1 N -° 

N b \ 


1 E 

E H 

(21) 

s=N“- , + 1 

t=N b -'+ 1 J 


M b 



f=1 

-A/f-'+s— 1| 

(22) 

M b 

-Nf-'+s— 1| 


f=l 

(23) 


and 


2k 2k M° M b / m x 2 

IWf-'-Wf-'+s-ll 


D =EEEE E 

a=1 b= 1 s=1 f=1 \/=1 


(24) 


The expected value of D is p r — p c , and the variance of D is 


i Act 2 
V{D} =— . 
1 J m 2 


(25) 


The expected value of 5 is a + r , and the variance of S is 


2(ff 2 + T 2 ) 

1) 


(AT)V + 2p(1-p) - 


B\ (1 — p) 2 f //n — 2\ D 


ml m — 1 \ m 


m ' 


(26) 


where p is the intraclass correlation defined in (9). Proof of these facts is given in Appendix B. 

If D and S 2 are independent, it follows by Box's [6] theorem that the sampling distribution of ES is a constant 0 
times a non-central f-distribution with v degrees of freedom, where 6 is given by 


0 = 



y/A(1-p) 

m 


(27) 


and v is given by 

(M’) 2 (m — I) 2 

( M-) 2 (m - 1)p 2 + 2p(1 - p)(^)B+ (1 - P) 2 [(^)C + J\ ’ 


(28) 


It follows from the results in Hedges [9] that the bias in ES can be corrected by multiplying ES by the correction 
factor J(v) defined in (11), so that the effect size 


G = J(v)ES (29) 

is approximately an unbiased estimator of It also follows that the variance of G is approximately 


V{G} = J(v) 2 


V0 2 
v- 2 


+ <5 2 



(30) 


As a caveat, it should be noted that independence of D and S 2 might not hold in unbalanced (AB)* designs. Still, 
unless the degree of imbalance is severe, these approximations should remain fairly accurate. 

Note that when k= 1 and rif = n, for /= 1, . . ., m; a =1,2, so that we have an (AB) 1 design, the auxiliary constant 
A = 2m(bi - c,), where fc> n and c q are as previously defined. Also, note that under these restrictions, B = 2mn 2 (fa 1 +c q ), 
C=2mn 2 (b 2 + c 2 ), and D = 2m 2 n 2 (b 2 + c 2 ), where b 2 and c 2 are as previously defined. Thus, when both phases have 
equal numbers of observations for all individuals, the expressions for the effect size and its variance in the unbalanced 
(AB) k design reduce to the corresponding expressions given earlier for the (AB) 1 design. 


6. Example 1 

Here, we analyze data from an unbalanced (AB) 2 design reported by Lambert ef al. [17]. This design has four 
phases: a baseline (control) phase followed by a treatment phase, another baseline phase, and another treatment 
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phase. Note that there are m = 9 cases, with each case having between 6 and 10 observations in each baseline 
phase and between 4 and 1 1 observations in each treatment phase. The minimum number of time points for 
the first phase is /W 1 =6, and the minimum numbers of time points for the second through fourth phases are 
M 2 = 4 , M 3 = 6, and M 4 = 7, respectively, so that M m = 23. Approximately 1 0% of the data are missing, with missing 
observations scattered intermittently across cases and phases. For present purposes, missing data are ignored in 
the computations provided in the succeeding texts; a more thorough analysis of these data would use slight 
modifications to our methods to account for the missing data points. This example illustrates the computation 
of the effect size and its variance in an (AB) 2 design that is close to balanced, although not perfectly. The example 
was chosen because it has a relatively large number of replications across cases (m = 9). 

The weighted average difference between phases is D= —5.458, S 2 = 4.674, and S = 2.1 62. The estimate of effect 
size (before bias correction) is therefore 


ES = 


-5.458 

2.162 


-2.525. 


This estimate can be used to describe study results in the same metric as the standardized mean difference of a 
between-subjects study. It can also be combined with estimates from other studies in meta-analysis, both 
estimates from other single case designs and from between-subjects studies on the same question. 

The results of this paper involving the autocorrelation structure can be used to calculate a correction for the 
estimation bias of ES as well as an estimate of effect size variance. Because the number of cases is relatively large 
for single case designs, the bias correction has little effect; however, this will not be true for many single case 
designs where the number of replications across cases is small. Following the method described in Appendix C, 
we estimate that the autocorrelation in these data is cp =0.225. Following the method described in Appendix C, 
we obtain an estimate of the within-case variation <x 2 =4.534; combining this with S 2 = 4.674, we obtain an 
estimate of 0.030 for the intraclass correlation p. Using the autocorrelation (p = 0.225, the values of the auxiliary 
constants are A = 1.754, 8 = 294.751, C= 223.488, and D = 2002.444. Inserting the value p = 0.030 and the values 
of the auxiliary constants B, C, and D into (28), we obtain v = 1 64.492 degrees of freedom. This value of the degrees 
of freedom permits us to compute G, the bias-corrected estimate of effect size. Using ES = —2.525 and inserting 
164.492 degrees of freedom into (29), we obtain G=— 2.513. Inserting the value of the auxiliary constant A and 
p = 0.030 into (27), we obtain the value 0 = 0.145. Finally, inserting the values v, 0, and 8 = G=— 2.513 into (30), 
we obtain V{G} = 0.041. 

Because the estimation of the variance and bias correction involves estimation of the nuisance parameters cp 
and p, it is useful to see how plausible variation in the estimates of these parameters might affect the bias 
correction and the variance estimates. The large-sample variance of the maximum likelihood estimator of tp is 
0.004, which corresponds to a standard error of 0.060; a range of plausible values (a range of two standard errors 
around the estimate) for cp is therefore about 0.10-0.35. Holding fixed the values of ES, S 2 , and the within-case, 
within-phase sample variance y*(0) (defined in Appendix C), we now examine how varying cp affects our estimates 
of v, G, and V{G}. The estimated degrees of freedom v ranges from 1 52.6 to 1 64.5. This variation has only a trivial 
impact on the effect size estimate: G ranges only from —2.512 to —2.513. The impact on the variance estimate is 
also minor, with V{G} ranging from 0.037 to 0.047. 


7. Example 2 


Here, we analyze data from an unbalanced (AB) 2 design reported in Anglesea et at. [3]. This design has four phases: 
a baseline (control) phase followed by a treatment phase, another baseline phase, and a final treatment phase. 
Note that there are m = 3 cases with n] =7, n 2 = 6, n 3 = 7, n\ = 7 for the first case; n\ = 4, n 2 = 4, 02 = 3, 02 = 3 for 
the second case; and o] = 4, 03 = 4, ol = 4, 02 = 2 for the third case. Here, the minimum number of time points 
for the first phase is M 1 =4; the minimum number of time points for the second phase is M 2 = 4; the minimum 
number of time points for the third phase is M 3 = 3; and the minimum number of time points for the fourth phase 
is M 4 = 2; so that M m = 13. This example was chosen because it is extreme, with just enough replications across 
cases (m = 3) to permit estimation of a variance. 

The weighted average difference between phases is D = 86.870, S 2 = 2347.8, and S = 48.455. The estimate of 
effect size (before bias correction) is therefore 


86.870 

48.455 


1.793. 


This estimate can be used to describe study results in the same metric as a standardized mean difference from a 
between-subjects study and can be combined with estimates from other studies in meta-analysis. 

Although using a bias-corrected effect size typically has little effect in between-subjects studies, this is not true 
for the present example. The estimated autocorrelation in these data is again small: the corrected Yule-Walker 
estimate is cp =0.176. The estimated within-cases variation is a 2 = 198.4; using this estimate and S 2 , we obtain 
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an estimate of 0.91 6 for p. Using the autocorrelation = 0.1 76, the values of the auxiliary constants are A = 0.889, 
6 = 52.162, C=41 .030, and D= 122.725. Inserting p = 0.91 6 and the values of the auxiliary constants B, C, and D into 
(28), we obtain v = 2.340 degrees of freedom. Using ES = 1.793 and inserting 2.340 degrees of freedom into (29), 
we obtain G = 1 .1 50. Inserting the value of the auxiliary constant A and p = 0.91 6 into (27), we obtain the value 
0 = 0.091. Finally, inserting the values v, 0, and <5 = G = 1 .1 50 into (30), we obtain V{G} = 2.440. 

Because estimation of the bias correction and variance involves the nuisance parameters, it is useful to see how 
plausible variation in the estimates of the nuisance parameters might affect the bias correction and the variance 
estimates. The large sample variance of the maximum likelihood estimator of <f> is 0.01 9, which corresponds to a 
standard error of 0.137; a range of plausible values for </> is therefore about —0.10 to 0.45. Using 0 = — 0.10, we 
obtain estimates of v = 2.310 degrees of freedom and a bias-corrected effect size of G= 1.140, with variance 
estimate V{G} = 2.636. In comparison, using <j> = 0.45 leads to v = 2.399 degrees of freedom, G= 1.167, and 
V{G} = 2.140. The impact of plausible variation of the nuisance parameters on the estimate (G) is small, but 
the impact on the variance is not. This is largely because the degrees of freedom are so small; one or two 
additional degrees of freedom would have substantially reduced the influence of the nuisance parameters 
on the variance. It is generally advisable to have at least a few additional case beyond m = 3. 


8. Conclusion 

We have introduced an effect size measure for one kind of effect in single case designs (mean shift between 
baseline and treatment phases) that estimates the same parameter as in the corresponding between-subjects 
design. The estimator itself depends on nuisance parameters only for a bias correction term. Even when nuisance 
parameters are estimated rather crudely, the estimator has relatively small bias. This makes the proposed 
estimation techniques suitable for use even in designs with as few as three independent cases, as in the second 
example earlier. However, the proposed variance estimator tends to have somewhat larger bias, typically being 
underestimated when nuisance parameters are estimated. 

The problem of effect size estimates whose variances depend on nuisance parameters is not unknown in meta- 
analysis. One example is when synthetic composite effect size estimates are created to 'average' or 'difference' 
correlated estimates within studies (see, e.g. [11]). The variance of the composite depends on the correlation 
structure of the estimates, which is typically unknown in detail. In this case, meta-analysts usually construct a 
variance estimate for the composite on the basis of the values of the correlations chosen so that the estimate 
is conservative (i.e. overestimating the variance of the composite effect size). Another example is when effect sizes 
are adjusted for clustering in experiments that involve multilevel sampling (see, e.g. [10]). In this case, meta- 
analysts usually choose values of the intraclass correlation that are designed to be conservative, yielding 
overestimates of the variance of the effect size. Choosing conventional but conservative values of the nuisance 
parameters in this case would be consistent with these precedents. 

If the estimates proposed here are used in meta-analyses, it would be advisable to use at least partially 
empirical variance estimation procedures, such as random effects models. Entirely empirical variance estimates 
for meta-analysis, such as those based on bootstrapping or randomization tests (e.g. [1]) or fully empirical robust 
standard errors [13], could also be used in the meta-analysis. If this approach is taken, the analyst can avoid 
entirely computations of variances of individual effect size estimates (except perhaps for crude decisions about 
weighting). If effect size estimates based on single case designs are included in meta-analyses alongside those 
from between-subjects designs, the latter are likely to receive much higher weight given their typically much 
larger sample sizes. Small biases in the variance of estimates from single case designs may therefore have 
relatively little impact on the overall analysis. 

Our statistical model and estimation procedures assume that the series shows no trend over time. Although 
this assumption appears plausible in many single case designs that we have seen, [23] little research has 
investigated the presence and form of trend empirically. If the researcher doubts the validity of this 
assumption in a particular application, there are several options. The easiest option to execute is to de-trend 
the data with regression, then analyze the residuals. However, de-trending (or other techniques such as taking 
first differences) can implicitly change the underlying modeling assumptions. Our preferred approach would 
be to make such assumptions explicit, by directly specifying a trend component in the statistical model. We 
anticipate that the methods described in this paper can be extended to a model with a time trend that 
is common across both phases and cases. However, more elaborate modeling assumptions, such as time 
trends that differ across phases or vary randomly across cases, may require specifying new and further effect 
size parameters. 

Similarly, our model assumes a first-order autocorrelation process. Again, little research exists about whether 
such an assumption holds, or whether higher order autocorrelations might be present in single case design data. 
However, the first-order assumption is common in this kind of research, exactly because of lack of either evidence 
or strong theory leading us to expect the contrary. This is a fruitful area for future research. 

Future research may also provide improved estimators of the variance of the effect size estimates proposed in 
this paper, as well as of related estimators. For example, our preliminary work suggests that bias corrections may 
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substantially improve estimation of autocorrelations when the total number of observations for each individual is 
small. The use of external information about autocorrelations to stabilize estimates (e.g. via empirical Bayes 
estimation) also has potential to improve variance estimation. 

Finally, the fact that the standard errors of study-level effect sizes can be large, especially when the number 
of cases is minimal, means that study effects may be statistically non-significant. This may dismay researchers, 
especially those who are used to judging effect size visually even when the number of case is small. We can 
say four things to ameliorate such concerns. First, our method is not intended to replace visual analysis but to 
complement it, while at the same time providing some statistical sense of how much confidence in the effect 
should be influenced by sampling error. Second, hopefully, both the present work and emerging standards for 
single case designs (e.g. [15]) will encourage single case design researchers to see the benefits of increasing 
the number of cases in each study. Even relatively small increases are likely to ameliorate the power problem 
greatly. Third, increasing the number of data points within each case will also increase power and may be 
especially helpful when increasing the number of cases is not feasible. Finally, the present method also allows 
single case researchers to aggregate the results of multiple studies on the same question, even when the 
dependent variables are in different metrics, using meta-analytic techniques. The meta-analysis of studies 
using single case designs, whether combined with results from between-subjects experiments or not, will 
greatly increase the power of the resulting effect size estimates (e.g. [21]). 


Appendix A 

The exact expectation and variance of ES can be derived from the moments of D and the inverse moments of 
S 2 . To simplify the notation, we limit this presentation to the balanced (AB) 1 design. First, observe that S 2 is a 
quadratic form of the 2nm-dimensional normal random variable y with covariance matrix E, so that 2n(m-1) 
S 2 = y'Ay = y'A'Ay. Then, it follows that Ay~N(0, AEA). The covariance matrix AEA has 2 n unique nonzero 
eigenvalues 2 q , . . ., X 2n , each repeated m-1 times, and each a function of 4>, p, m, and n. It follows by 
Theorem 3.2b.4 in Mathai and Provost [19] that, for values h<n(m- 1), 

0 

where ex is an arbitrary constant such that 1 1 - 2odj| < 1 for j= 1, . . ., m-1. This expression can be evaluated 
numerically. Because D is independent of S, the moments of ES are the products of the moments of D and the 
inverse moments of S. 


J [l — u(l — 2 cdy)] 2 du 
y=i 


Appendix B 

Here, we give the derivation of the effect size estimator for the general (AB) k design, allowing unequal number of 
time points per phase and per individual. The expected value and variance of D follow from basic properties of 
the multivariate normal distribution. It remains to find the sampling distribution of S 2 . 

Partition the data mM m x 1 vector of observations into 2k subvectors so that 

y' = (y:,--.,yf) 

where the vector y^ is the vector of mM a observations in the o th phase. Further, partition y a . into the observations 
for the M a time points (shared across all individuals) in the a th phase as 

y a . = (yf ,...,yM«'),a» i , - - - , 2 k 

where the vector y° of m observations at the s th time point is given by 

y 5 ~ (a/^-t+s) ; ■ • ■ ’ 5 ^ = 1 , • • ■ j M . 

The complexity of this partition arises for two reasons. First, the number of time points per phase shared across 
all individuals, M a , need not be the same across phases (if it were, we could set M a = M). Secondly, the number of 
time points per individual in each phase is not the same across either individuals or across phases. Moreover, the 
number of time points per individual need not be the same as the number of time points shared across individuals 
in any phase. For example, the first time point in a phase may be the fifth time point overall for individual i but the 
eighth time point for individual j. 
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Partition the mM ' x mM’ covariance matrix I T of y into a 2k x 2k partitioned matrix 

\ 


e t = 




£l(2A-) 




£(2£)1 


£(2*X2*) 


y 

where I.j b is the mM a x mM b matrix of covariances of y a . with y b m . Further, partition each matrix Tj b into a total of 
M a M b submatrices each of dimension mxm as follows 

^ nh wi nh ^ 


y \ab 

Z-t — 


V 


Ef? 


E a \ 



1 M b 




E ab 


E ‘ a \ „ 

M a 1 


M M h 


where Z.°, b is the covariance matrix of y° with yf. Because observations from different individuals are independent, 
each of the T° b is an mxm diagonal matrix. That is, the covariances between any observation in phase a for 
individual / and any observation in phase b for individual j is zero if / / j. Only the covariances between different 
observations on the same individual are non-zero. However, because of the imbalance in the design, the diagonal 
elements of I°t b are not equal. This is because the covariance between observations on the same individual 
depends on the number of observations separating them, but the s th observation in phase a for individual / 
(observation A/, a 1 + s) is separated from the f th observations in phase b for individual i (observation /V, b ~ 1 +f) 


by (A/, 0 1 + s - N- 


b-l 


The matrix l° b can be written as the sum of two diagonal matrices 

rob _ _2. 


f) observations. Because this quantity depends on /, it is different for each individual. 


= T 


—lr\ ab 


where the the / th diagonal element of D° b is the autocorrelation <fi raised to the power d, abst = \ N ° 1 - N b 1 ■ 
that is, 


4 


O ab = diag(yV 


diabst 




, dmabst \ 


Note that = Df S °, because interchanging both a and b and s and t simply changes the sign of the exponent 
within the absolute value. 

The quadratic form S 2 can be written as S 2 = y'A T y//W*(m - 1 ), where the matrix A T can be partitioned 
conformably to I T , that is, partition the mM ’ x mM * matrix A T into a 2k x 2k block diagonal partitioned matrix 


0 


At 


where A° is the mM a x mM a block diagonal matrix. Further, partition each matrix A° into a total of (M a ) 2 
submatrices each of dimension mxm as follows 

\ 



(A, 


0 

A° = 





0 


V 

=n=\ n 

-i m i 

nil m and l m 


1 m 1 m '/m and l m is an mxm identity matrix and 1 m is a m x 1 


vector of 1 's. 


The expected value of M’(m — 1)S 2 , the numerator of S 2 , 

is 

a’eV 


a’eV 2 * 1 

v_ 

II 

1 - 

w 

F— 

4—* 

II 

on 

T 

3 

• 

LLI 





y^2/c^(2/c)1 


^2kji(2<r)(2(r) 


Now, each A°Zt is of form 


A°E? = 


Therefore, the expected value of M’(m - 1)S 2 is 


AE^ 






£LY ab 

1 


ay\ ab 

M°M b 
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S>(A a I 7) = ££tr(AI“) = ££( ff2 + T 2 )(m - 1) = M\m )(a 2 + t 2 ) 

<7= 1 <3=1 S=1 <3=1 5=1 

and the expected value of S 2 = o 2 + t 2 . 

The variance of S 2 can be obtained from the variance of /W*(m-1)S 2 , the numerator of S 2 , as 

V (/W* (m-l)S 2 ) = 2tr(A 7 -H T A 7 -H T ) 


2tr< 




= 2tr 


a'h' t ' 


A'H' t (2 ' [) 

A'n; 1 


A'n; 2,0 ") 






: > 

y^ 2/c^( 2Ac )1 
2k 


y^ 2/c^( 2 k){ 2k ) 

2k 

V 


^ 2k^( 2k)( 2k) 

/ 

\ 


^A'H' t 0 A 0 H' 


2k 

^ A 2 k L‘.2*)° A 0 E 


£A'n' T a A a H 


a (2k) 
T 


2k 

£ p i { - 2k ^ 2k '< a 


a(2k) 

T 


a=1 


so that 


2k 2k 

2tr(A r I T A 7 -I T ) = 2££tr(A b I^A a lf). 

<7=1 6=1 


Now, each A 6 I T bo A a I T ob is of form 


( 


y^ 6 ^Da y^ a ^<76 


Ay ba 

KZ *m» 1 


AH' 


ifco V j\y, nb 


1 M a 


AH 


>b a 

J M b M° 


A 


Ayafa 


AH! 


1M° 


AH afa 

M a M b 



Therefore, 


2k 2k /M° M b 

2tr(A r I T A r I T ) 2 V V V V tr(AI bo AZ° b ) . 


<7=1 6=1 S=1 f=1 


Recall that = r 2 l m + cr 2 D! b , so that 


AI ba AE! t b = A(t 2 I + rr 2 D bo )A(T 2 l + <j 2 D° f) = t 4 A 2 + T 2 a 2 K 2 D a ^ + T 2 tr 2 AD bo A + f7 4 AD ba AD° b . 


Using the facts that A is idempotent and that D° t b = D?° is diagonal, we see that 


tr(AI b0 Al! b ) ={m- 1)t 4 + 2a 2 


m - 1 
m 


Recalling that A = l m - and using the elementary properties of the trace, we obtain an expression for the 

trace of AD? b AD! b in the last term as 


tr(AD bo AD s ob ) 


)-tr([D s f] 2 )-£ r ([D! b ] 2 )+±tr( 


1 m 1 m D b0 1 m 1 m D 5 ab ) = ("^)tr([D! b ] 2 ) +f l[tr(Df)] 2 


Using the fact that Df b is diagonal, we see that 
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tr(D“ fa ) = ^2 < t >d '° bx 
/=1 

and 

m 

tr([D“ b ] 2 ) 

i=i 

so that 



Summing this expression over a = 1,. . .,2k; b = 1,.. .,2k; s = 1,. . ,,M a ; and f = 1,. . ,,M b , then rearranging terms, we obtain 
expression (26). 


Appendix C 


Here, we give formulas for estimators of (j> and p in the unbalanced (AB) k design, of which the (AB) 1 design is a 
special case. Define the sample auto-covariance for case / and phase a as a function of the lag h : 


Nf-h 




l j=N °-'+ 1 


where Y° is the average across observations of the /' th individual within the o th phase given by 

n? 

? <- = n° E Y f 


l j=Nf -'+ 1 


For improved precision, these can be pooled across phases and cases, yielding 


« m 2k 

K h) - w .EE n ^w>)- 


An estimate of <j) is then given by 
where the constant c is given by 


c = 


*-§h 


m 2k / \ m 2k 

EE i 2km-EEV"? 

1=1 a=1 v 1 ' j=1 a=1 

™ J* , , N? — 2km 

EEW-i) 

1=1 0=1 


The use of the c correction makes t}> approximately unbiased when (f> = 0. However, for non-null 4>, the estimate 
remains biased towards zero, particularly so when each phase is short. Note that for balanced designs (in which nf = n 
for /=!,.. ,,m and a= 1,. . .,2k), the constant simplifies to c= 1/n, the correction studied by Huitema and McKean [14]. 

An estimate of a 2 is needed for the purpose of estimating p. The Yule-Walker estimate of rr 2 is simply y*(0), the 
within-phase-within-individual variance (using divisor Nl rather than N’.-2mk). The expected value of y*(0) is 
Ec^/Nl, where 


m 2k 

£ -EE 




It follows that <j 2 = Nlyl(0)/E is an unbiased estimator of tr 2 for known 0. Note that in the balanced (AB) 1 design, 
E/Nl reduces to (1 -£>,). We then estimate p via 
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